An intermediate phase in DNA melting 
We predict a novel temperature-driven phase transition of DNA below the melting transition. The
additional, intermediate phase exists for repetitive sequences, when the two strands have different
lengths. In this phase, the excess bases of the longer strand are completely absorbed as bulge loops
inside the helical region. When the temperature is lowered, the excess bases desorb into overhanging
ends, resulting in a contour length change. This continuous transition is in many respects analogous
to Bose-Einstein condensation. Weak sequence disorder renders the transition discontinuous.



The base-pairing interaction between the two strands of DNA is not only
pivotal to its biological function, but also leads to intriguing
applications in nanotechnology. One approach to probe this interaction is
to monitor the DNA conformation as a function of temperature.
Experimentally, one can observe the number of base pairs formed (using UV
absorption), as well as changes of intramolecular distances on the
nanometer scale (using modern single-molecule techniques). On the
theoretical side, the temperature dependence of DNA conformations has been
studied for almost fifty years, using models of various degrees of
complexity. Particular attention has been paid to the characteristics of
the melting transition, where the two strands separate completely. Whereas
early models yielded only a crossover [13], the Poland-Scheraga (PS) model
was the first to display a phase transition, albeit a continuous one, which
appeared to be at variance with the experimentally observed sharp jump in
the fraction of bound base pairs. Only recently have mechanisms been
proposed [10, 11] which yield an abrupt, first-order transition. So far,
however, most analyses of DNA melting have incorporated only native
interactions, i.e. base pairs that occur in the ground state of the
molecule (with notable exceptions). It is our aim here to show that
non-native interactions can introduce an intermediate phase in the melting
behavior of DNA, associated with an additional conformational transition
before strand separation.

Non-native interactions are particularly relevant for repetitive DNA
sequences, which are common in genomes [14]. Periodic DNA, with e.g. a
single base repeat such as TTT... or a higher order repeat such as
CAGCAG..., can take on base-pairing patterns with asymmetric loops, and the
two complementary strands can be shifted relative to each other. Here, we
consider the general situation where the two strands can have arbitrary
lengths N, M. We describe the DNA using a generalized PS model [13] and
calculate its equilibrium behavior analytically. We find that for N ≠ M,
the bound phase splits into two separate phases. The low temperature phase
is characterized by an extensive length of the unbound end on the longer
strand, whereas in the new intermediate phase these overhanging bases are
absorbed into the helical region. Mathematically, and also conceptually,
many aspects of this transition are analogous to Bose-Einstein condensation
(BEC), as "particles" (bases) condense into a single "state" (the
overhanging end), which thereby acquires macroscopic "occupation" (length).
Obviously, the analogy extends only to the behavior of the partition
function, as there is no quantum coherence in the DNA problem. Effectively,
the transition amounts to a temperature-sensitive change in the contour
length of the DNA molecule, which should be observable with optical or
single-molecule methods. While the transition is continuous for perfectly
periodic sequences, we find that it becomes a first-order transition once
(weak) sequence disorder is introduced. We also show that the non-native
interactions can change the order of the melting transition, as has been
conjectured previously [15].

DNA model. — We consider two DNA strands with 
lengths N and M > N, respectively, and describe their interaction with a
generalized PS model [13]. Specifically, a base i ≤ N of the lower strand
can form a base pair with every complementary base j ≤ M of the upper
strand, whereas the formation of base pairs within a strand can be
neglected (since we are interested only in sequences with a high degree of
complementarity and a low degree of self-complementarity). Due to
geometrical constraints, we may neglect the 'crossing' of base pairs, e.g.
two base pairs $(i_1, j_1)$ and $(i_2, j_2)$ with $i_1 < i_2$ but
$j_1 > j_2$. The base-pairing pattern S, i.e. the set of all formed base
pairs, then creates a DNA conformation consisting of bound segments
alternating with (possibly asymmetric) loops, see Fig. 1. To simplify the
discussion, we enforce the base pair (N, M) at the right end, so that we
need to consider only one overhanging end. Experimentally, this boundary
condition would be realized e.g. by a few particularly strong base pairs at
one end.

FIG. 1: A possible configuration of two complementary DNA strands with a
repetitive sequence (a bead represents one repeat unit). Note that
repetitive sequences can form base-pairing patterns with asymmetric loops.
In general we allow for different strand lengths N, M. The last repeat
units (squares) are permanently bound.

To each base-pairing pattern S, we assign a statistical weight Q(S), which
takes the form of a product with factors of four different types: (i) a
Boltzmann factor $q = e^{\varepsilon_b / k_B T}$ for every base pair with
binding energy $-\varepsilon_b < 0$, (ii) a Boltzmann factor
$g = e^{-\varepsilon_l / k_B T}$ for every loop with loop initiation cost
$\varepsilon_l > 0$, (iii) an entropic factor $B(m) = s^m m^{-c}$ for each
loop, which is the increase in the number of polymer configurations when m
bases form a (floppy) loop instead of being in a (rigid) double helical
conformation, and (iv) a similar entropic factor $A(n) = s^n n^{-\bar{c}}$
for a single-stranded end of n bases. Here, the exponents $c$, $\bar{c}$ in
the entropic factors are universal in that they are independent of the
detailed polymer properties, but are sensitive to excluded volume
interactions. For interacting self-avoiding loops one has $c \approx 2.15$,
while $\bar{c} \approx 0.1$. Whereas the value of $c$ determines the
critical behavior at the melting transition, the non-universal constant s
has no qualitative effect on the melting behavior (we use s = 10 in all
numerical examples). In the following, we apply the DNA model to perfectly
periodic sequences, where each repeat unit can be treated as an effective
base with renormalized parameters (we use $\varepsilon_b = 6$ and
$\varepsilon_l = 3$ in temperature units, $k_B = 1$). We emphasize that our
simplistic model for the involved energies and entropies is meant to
illustrate the physical phenomena in a transparent way, but leads to an
unrealistic temperature scale. With a more detailed description, we find
that all of the interesting behavior happens at accessible temperatures
[17].
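
The four types of weight factors can be made concrete with a short numerical sketch. The function names below are illustrative and not taken from the paper; the convention A(0) = 1 for a vanishing overhang is an assumption of the sketch (it matches the constant term in the transform of A used later), and the example pattern is restricted to a single overhanging end.

```python
import math

def boltzmann_factors(T, eps_b=6.0, eps_l=3.0):
    """Return (q, g) at temperature T, with k_B = 1 as in the text."""
    q = math.exp(eps_b / T)    # weight of one base pair (binding energy -eps_b)
    g = math.exp(-eps_l / T)   # weight for initiating one loop (cost eps_l)
    return q, g

def loop_factor(m, s=10.0, c=2.15):
    """Entropic factor B(m) = s^m m^(-c) of a loop with m >= 1 unpaired bases."""
    return s**m * m**(-c)

def end_factor(n, s=10.0, c_bar=0.1):
    """Entropic factor A(n) = s^n n^(-c_bar) of a single-stranded end.

    A(0) = 1 (no overhang) is an assumption of this sketch.
    """
    return 1.0 if n == 0 else s**n * n**(-c_bar)

def pattern_weight(T, n_pairs, loop_sizes, end_length):
    """Weight Q(S) of a pattern with n_pairs base pairs, loops of the given
    total sizes, and one single-stranded end of end_length bases."""
    q, g = boltzmann_factors(T)
    weight = q**n_pairs * end_factor(end_length)
    for m in loop_sizes:
        weight *= g * loop_factor(m)
    return weight

# Example: five base pairs, one loop of two bases, an overhang of three bases.
example = pattern_weight(T=1.0, n_pairs=5, loop_sizes=[2], end_length=3)
print(example)
```

Keeping the energetic factors (q, g) separate from the entropic factors (A, B) mirrors the structure of the product in the text and makes it easy to vary s, c, or the energies independently.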

Free energy of periodic DNA. — To obtain the equilibrium properties of the
DNA model, we calculate the partition sum over all base-pairing patterns,
$Z_N^M = \sum_S Q(S)$. By separating the single-stranded ends from the
double-stranded part, see Fig. 1, we write $Z_N^M$ as

$$Z_N^M = \sum_{r=1}^{N} \sum_{t=1}^{M} A(N-r)\, A(M-t)\, W_r^t , \qquad (1)$$

with $A(0) = 1$. Here, $W_r^t$ is the partition function of two
complementary and periodic strands of length r and t with the first and
last base pair formed. $W_r^t$ obeys the recursion relation

$$W_{r+1}^{t+1} = q\, W_r^t + g q \sum_{\substack{k<r,\; m<t \\ k+m>0}} B(k+m)\, W_{r-k}^{t-m} , \qquad (2)$$

with the initial conditions $W_1^1 = q$ and $W_1^i = W_i^1 = 0$ for
$i > 1$. Eqs. (1) and (2) can be used to calculate $Z_N^M$ for finite
lengths N, M.

FIG. 2: The contour integration required for the inverse transformation of
$Z(x, y_0)$ is given by the sum of the integral along the branch cut
$[s^{-1}, \infty] \subset \mathbb{R}$ and the integral encircling the
singularity of $W(x, y_0)$ at $x = x^*(y_0)$. This isolated singularity
only exists below the melting temperature.

To extract the thermodynamic behavior in the limit of long strands, we take
the z-transform $Z(x,y) = \sum_{N,M} Z_N^M x^N y^M$. Compared to the related
case of a single self-complementary RNA strand folding back onto itself, we
need two transformation variables instead of one here, due to the second
strand of DNA. One obtains



$$Z(x,y) = \frac{A(x)\, A(y)\, q x y}{1 - q x y + \frac{q g x y}{x - y}\left( y B(y) - x B(x) \right)} , \qquad (3)$$

where the transforms of the entropic factors are given by
$A(z) = \phi_{\bar{c}}(sz) + 1$ and $B(z) = \phi_c(sz)$, with the
polylogarithm $\phi_c(z) = \sum_{n=1}^{\infty} z^n n^{-c}$.
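
As a consistency check, the recursion for the duplex partition function can be iterated numerically and compared with its closed-form transform. The sketch below assumes the recursion in the form $W_{r+1}^{t+1} = q W_r^t + g q \sum B(k+m) W_{r-k}^{t-m}$ with $W_1^1 = q$, and the duplex part of the transform as $qxy / [1 - qxy + \frac{qgxy}{x-y}(yB(y) - xB(x))]$; the function names and the chosen values of x, y, and T are illustrative only.

```python
import math

def loop_weight(n, s, c):
    """Entropic factor B(n) = s^n n^(-c) of a loop with n unpaired bases."""
    return s**n * n**(-c)

def duplex_table(R, q, g, s, c):
    """Table W[r][t], 1 <= r, t <= R, of duplex weights with the first and
    last base pair formed, filled by the recursion assumed in the lead-in."""
    W = [[0.0] * (R + 1) for _ in range(R + 1)]
    W[1][1] = q
    for r in range(1, R):
        for t in range(1, R):
            total = W[r][t]                 # new pair stacked on the previous one
            for k in range(r):              # k unpaired bases on the lower strand
                for m in range(t):          # m unpaired bases on the upper strand
                    if k + m > 0:
                        total += g * loop_weight(k + m, s, c) * W[r - k][t - m]
            W[r + 1][t + 1] = q * total     # closing base pair contributes q
    return W

def polylog(z, c, terms=200):
    """Truncated series for phi_c(z); adequate only for |z| well below 1."""
    return sum(z**n * n**(-c) for n in range(1, terms + 1))

def duplex_transform(x, y, q, g, s, c):
    """Closed-form transform of W (requires x != y and s*x, s*y < 1)."""
    Bx, By = polylog(s * x, c), polylog(s * y, c)
    denom = 1.0 - q * x * y + q * g * x * y / (x - y) * (y * By - x * Bx)
    return q * x * y / denom

T, s, c = 2.0, 10.0, 2.15
q, g = math.exp(6.0 / T), math.exp(-3.0 / T)
x, y, R = 0.01, 0.008, 14
W = duplex_table(R, q, g, s, c)
series = sum(W[r][t] * x**r * y**t
             for r in range(1, R + 1) for t in range(1, R + 1))
closed = duplex_transform(x, y, q, g, s, c)
print(series, closed)
```

For fugacities well inside the radius of convergence the truncated double series and the closed form agree closely, which supports the sign and prefactor of the loop term in the denominator.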

The z-transformation carried out above amounts to a change from the
canonical to the grand canonical ensemble. The transformation variables x,
y play the role of fugacities for bases in the lower and upper strand,
respectively. However, for the ensuing discussion, it is advantageous to
keep the length N of the shorter strand fixed as a reference. Hence, we
perform the inverse transformation for the lower strand by contour
integration in x, see Fig. 2, to obtain the partition sum $Z_N(y_0)$ for N
bases on the lower strand and the upper strand coupled to a "nucleotide
reservoir" with fixed fugacity $y_0$. Whenever both strands are bound,
$Z(x, y_0)$ has a singularity at $x^*(y_0) < s^{-1}$, see Fig. 2. For large
N, the contour integration is dominated by the residue at $x^*(y_0)$,
leading to $Z_N(y_0) \approx A(y_0)\, x^*(y_0)^{-N}$. Hence, the free energy
of the bound phase is given by $N f_b(y_0) - T \ln A(y_0)$, where the first
term is the contribution of the helical region with a free energy per
length $f_b(y_0) = T \ln x^*(y_0)$, and the second term is the contribution
from the unbound end of the longer strand. The free energy for given N and
M is then obtained by saddle point integration,

$$F(N, M) = N f_b(y_0) - T \ln A(y_0) + M T \ln y_0 , \qquad (4)$$

where the fugacity $y_0$ is determined by

$$M = \langle M \rangle_{y_0} = y_0 \frac{d \ln A(y_0)}{d y_0} - N\, \frac{y_0}{T} \frac{d f_b(y_0)}{d y_0} . \qquad (5)$$
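
The condition fixing the fugacity follows from the stationarity of the saddle-point exponent. The short derivation below is a sketch under the assumption that the inverse transform in y is written as a contour integral over $Z_N(y)\, y^{-M-1}$; it also gives the overhang term in closed form using $A(z) = 1 + \phi_{\bar{c}}(sz)$ and the identity $z\, \phi_c'(z) = \phi_{c-1}(z)$.

```latex
\begin{align}
Z_N^M &= \oint \frac{dy}{2\pi i}\, \frac{Z_N(y)}{y^{M+1}}
  \approx \exp\!\left[ -\frac{1}{T}
  \Big( N f_b(y_0) - T \ln A(y_0) + M T \ln y_0 \Big) \right], \\
0 &= \frac{d}{dy_0} \Big[ N f_b(y_0) - T \ln A(y_0) + M T \ln y_0 \Big]
  \;\Longrightarrow\;
  M = y_0 \frac{d \ln A(y_0)}{d y_0}
    - N\, \frac{y_0}{T} \frac{d f_b(y_0)}{d y_0}, \\
y_0 \frac{d \ln A(y_0)}{d y_0}
  &= \frac{\phi_{\bar{c}-1}(s y_0)}{1 + \phi_{\bar{c}}(s y_0)} .
\end{align}
```

Since $x^*(y_0)$ decreases as $y_0$ grows, $d f_b / d y_0 < 0$, so both terms on the right-hand side of the stationarity condition are positive.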



Phase diagram. — To extract the physical behavior of 
the DNA model from Eqs. (3) and (5), we focus on two observables, the total
number of base pairs, $N\theta$, and the length of the single-stranded
overhang. The fraction $\theta$ of bound base pairs is calculated from the
free energy per length of the helical region as

$$\theta = -\frac{q}{T} \frac{\partial f_b(y_0)}{\partial q} . \qquad (6)$$

FIG. 3: Top: The length of the unbound end, normalized by the total length
N of the shorter strand. For finite systems (N = 1000, dashed line), the
unbound end shrinks to a minimal value and increases again as the melting
temperature is approached. Monte Carlo simulation data (circles) agree well
with the analytical result. In the $N \to \infty$ limit, the overhang
length diverges below $T_c$ and is of order 1 for $T > T_c$. Bottom: The
fraction of bound base pairs $\theta$ as a function of temperature. For
periodic sequences with c = 2.15, $\theta$ vanishes with zero slope,
whereas a random sequence shows a first-order phase transition. When
increasing c to 3.15, the periodic sequence displays a similar first-order
transition.



To obtain the overhang length, we note that the right hand side of Eq. (5)
decomposes the total length M of the upper strand into two contributions,
where the first term is the expected overhang length and the second term
corresponds to the number of bases in the helical region. The dashed line
in Fig. 3 (top) shows the overhang length as a function of temperature, for
N = 1000 and M = 1150. At low temperatures, the two DNA strands are
completely aligned, so that all M − N excess bases of the longer strand
form an overhanging end. However, we observe that the overhang length
decreases with increasing temperature, dropping almost to zero before it
rises again sharply at even higher temperature. We see in Fig. 3 (bottom)
that this drop occurs in a temperature range where almost all possible base
pairs are formed, and the rise occurs when the two strands separate. These
observations suggest that a temperature-driven conformational transition
occurs before the melting transition.
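
The shrinking of the overhang with temperature can already be seen for very short strands by summing the finite-size partition function directly. The sketch below is not the method used for the figures; it assumes the decomposition $Z_N^M = \sum_{r,t} A(N-r) A(M-t) W_r^t$ with A(0) = 1 and the recursion $W_{r+1}^{t+1} = q W_r^t + g q \sum B(k+m) W_{r-k}^{t-m}$, and the strand lengths N = 12, M = 15 are chosen only to keep the brute-force sums cheap.

```python
import math

def mean_overhang(N, M, T, eps_b=6.0, eps_l=3.0, s=10.0, c=2.15, c_bar=0.1):
    """Mean length of the unbound end of the upper (longer) strand for
    finite strands of N <= M repeat units at temperature T (k_B = 1)."""
    q, g = math.exp(eps_b / T), math.exp(-eps_l / T)
    # Entropic factors; index 0 of B is unused, A[0] = 1 is an assumption.
    B = [0.0] + [s**n * n**(-c) for n in range(1, N + M)]
    A = [1.0] + [s**n * n**(-c_bar) for n in range(1, M + 1)]

    # Duplex weights W[r][t] with first and last base pair formed.
    W = [[0.0] * (M + 1) for _ in range(N + 1)]
    W[1][1] = q
    for r in range(1, N):
        for t in range(1, M):
            total = W[r][t]
            for k in range(r):
                for m in range(t):
                    if k + m > 0:
                        total += g * B[k + m] * W[r - k][t - m]
            W[r + 1][t + 1] = q * total

    # Attach the single-stranded ends and average the upper-strand overhang.
    Z = 0.0
    weighted_length = 0.0
    for r in range(1, N + 1):
        for t in range(1, M + 1):
            w = A[N - r] * A[M - t] * W[r][t]
            Z += w
            weighted_length += (M - t) * w
    return weighted_length / Z

low_T = mean_overhang(12, 15, T=0.5)
higher_T = mean_overhang(12, 15, T=1.0)
print(low_T, higher_T)
```

At the lower temperature the mean overhang stays close to M − N = 3, while at the higher temperature part of the excess is absorbed into bulge loops. Strands this short show only the trend, not a sharp transition.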

FIG. 4: Left: The fugacity $y_0$ vs. T for different system sizes N. In the
thermodynamic limit, $y_0 = s^{-1}$ for $T < T_c$. As for BEC, $y_0$
approaches its limiting value as $s^{-1} - y_0 \sim 1/N$. Right: Phase
diagram of periodic DNA. At low temperatures, both strands are completely
aligned and excess bases of the longer strand form an unbound end. In the
intermediate phase, all excess bases are absorbed into the helical region.

This transition is in fact completely analogous to BEC, as Eq. (5)
parallels the behavior of the equation of state for an ideal Bose gas: If
we divide Eq. (5) by our system size N and introduce the "particle density"
$\alpha = M/N$, we obtain

$$\alpha = \frac{1}{N}\, \frac{\phi_{\bar{c}-1}(s y_0)}{1 + \phi_{\bar{c}}(s y_0)} + \alpha(y_0) , \qquad (7)$$

where $\alpha(y_0) = -\frac{y_0}{T} \frac{d f_b(y_0)}{d y_0} \ge 1$ is the
density inside the helical region. In Eq. (7), the first term on the right
hand side corresponds to the occupation of the ground state of an ideal
Bose gas, whereas $\alpha(y_0)$ is analogous to the occupation of the
excited states. The fugacities of a Bose gas and our DNA are bounded: for
the former, by the energy of the ground state, and for the DNA by the
weight of an unbound monomer, i.e. $y_0 \le s^{-1}$. The population of the
excited states increases monotonically with the fugacity, and attains a
finite maximal value, in our case $\alpha_{\max} = \alpha(s^{-1})$
(provided the loop exponent c > 2). If the density $\alpha$ exceeds this
maximal value, the length of the unbound end has to diverge in order to
accommodate the remaining bases. In other words, the length of the unbound
end becomes extensive. In an analogous way, the ground state of a Bose gas
is macroscopically populated at low temperatures. In this "condensate"
phase, the fugacity is locked to the value $s^{-1}$ in the thermodynamic
limit ($N, M \to \infty$, $\alpha = \mathrm{const.}$). The deviation for
finite systems scales as $s^{-1} - y_0 \sim 1/N$, see Fig. 4 (left). In the
opposite case, where $\alpha < \alpha_{\max}$, there is a solution to
Eq. (7) with $y_0 < s^{-1}$ and the unbound end remains finite for all
system sizes.
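
The "ground state" term of Eq. (7) can be evaluated numerically to see how the overhang grows as the fugacity approaches its bound. The sketch below assumes the overhang term in the form $\phi_{\bar{c}-1}(s y_0) / (1 + \phi_{\bar{c}}(s y_0))$ and sums the polylogarithm series directly, which is only practical for $s y_0$ not too close to 1. The function names are illustrative, and the asymptotic estimate $(1 - \bar{c}) / (1 - s y_0)$ quoted in the comments is a leading-order result for the series, not a statement from the paper.

```python
def polylog(z, c, tol=1e-14, max_terms=2_000_000):
    """Direct series for phi_c(z) = sum_{n>=1} z^n n^(-c), for 0 < z < 1."""
    total = 0.0
    power = z
    for n in range(1, max_terms + 1):
        term = power * n**(-c)
        total += term
        if term < tol * total:   # remaining tail is negligible
            break
        power *= z
    return total

def expected_overhang(y0, s=10.0, c_bar=0.1):
    """Mean overhang length y0 * d ln A / d y0 at fugacity y0 < 1/s."""
    z = s * y0
    return polylog(z, c_bar - 1.0) / (1.0 + polylog(z, c_bar))

# The overhang grows roughly like (1 - c_bar) / (1 - s*y0) as s*y0 -> 1,
# so an extensive overhang requires 1/s - y0 of order 1/N.
for z in (0.9, 0.99, 0.999):
    length = expected_overhang(z / 10.0)
    print(z, length, length * (1.0 - z))
```

The product of the overhang length and $1 - s y_0$ settles near $1 - \bar{c}$, which is the numerical counterpart of the $1/N$ scaling of the fugacity in the condensate phase.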

It is easily shown that $\alpha_{\max}$ approaches 1 at low temperatures,
and consequently all excess bases of the longer strand are condensed in the
overhang, as illustrated in Fig. 4 (right). As T increases, more and more
bases are absorbed in the helical region ($\alpha_{\max}$ increases), and
the system enters the intermediate phase at $T = T_c$, where
$\alpha_{\max} = \alpha$. At $T_c$ the condensate fraction vanishes, as the
solid line shows in Fig. 3 (top). If T is raised to the melting temperature
$T_m$, which is independent of $\alpha$, the strands separate and $\theta$
vanishes (denatured phase). Note that the intermediate phase exists only
when $\alpha$ is not too large.

FIG. 5: Mutations in the sequence render the transition to the
intermediate phase discontinuous. The plot shows Monte Carlo data for the
length of the unbound end for evenly spaced mutations every 100 and 200
bases. This length drops discontinuously at a critical temperature $T_c$,
which depends on the mutation density and the binding strength of mutated
base pairs.

It has been previously predicted that the loop exponent c is effectively
reduced by one for periodic sequences compared to the standard PS model
with native base pairs only. This prediction is explicitly confirmed by our
exact calculation of the free energy. We find [17] that there is no melting
transition if c < 2, that the transition is continuous if 2 < c < 3 and of
first order if c > 3. For 2 < c < 3, we obtain
$\theta \sim |T - T_m|^{(3-c)/(c-2)}$, using the same method as [20] for
the standard PS model. To illustrate this, we plot $\theta$ for periodic
sequences and for the standard PS model in Fig. 3 (bottom). Whereas for the
latter $\theta$ drops discontinuously to zero for c = 2.15, $\theta$ of
periodic DNA vanishes with zero slope. Only after increasing c artificially
to 3.15 does periodic DNA exhibit a similar first-order transition.
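
The dependence of the melting transition on the loop exponent can be summarized in a few lines. The helper below is a sketch: it encodes the statement that the exponent is effectively reduced by one for periodic sequences, uses the standard PS thresholds (no transition up to 1, continuous up to 2, first order above 2) for the effective exponent, and treats the boundary values themselves by an arbitrary convention that the text does not specify.

```python
def melting_transition(c, periodic=True):
    """Return (order, beta) for loop exponent c.

    order is "none", "continuous" or "first order"; beta is the exponent in
    theta ~ |T - T_m|^beta for a continuous transition, otherwise None.
    """
    c_eff = c - 1.0 if periodic else c      # periodic DNA: exponent reduced by one
    if c_eff <= 1.0:
        return "none", None
    if c_eff <= 2.0:
        beta = (2.0 - c_eff) / (c_eff - 1.0)  # equals (3 - c)/(c - 2) if periodic
        return "continuous", beta
    return "first order", None

for c, periodic in [(2.15, True), (3.15, True), (2.15, False)]:
    print(c, periodic, melting_transition(c, periodic))
```

For c = 2.15 the periodic case gives an exponent of about 5.7, well above 1, which is why the fraction of bound base pairs vanishes with zero slope.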

Weak sequence disorder. — Is the intermediate phase 
identified above robust against sequence disorder? To address this
question, we replace a small fraction of base pairs by bases that can pair
with each other, but not with other bases in the sequence. Fig. 5 shows the
average length of the overhanging end, obtained by Monte Carlo simulation,
for evenly spaced mutations with densities 0.005 and 0.01 and mutation
strength $\varepsilon_b = 2$. The plot suggests that in the presence of
weak sequence disorder the transition described above remains, but is of
first order instead of being continuous. The unbound end keeps its ground
state length up to a certain temperature, and then shortens rapidly. This
is readily understood when comparing the energy barriers for forming bulge
loops with and without mutations. The formation of a bulge loop on the
longer strand of a perfectly periodic molecule requires only the initiation
energy $\varepsilon_l$. In the presence of mutations, however, shifting
both strands breaks mutated base pairs. Hence, to form a bulge loop, all
mutations to the left of the loop have to be broken and the energy barrier
for loop formation grows with the distance from the end. Due to this
extensive energy barrier for loop formation, mutated base pairs stay bound
in a finite temperature range. For a sufficiently low density of mutations,
there is a temperature $T_c$ at which the entropy gained by distributing
excess bases in loops along the molecule outweighs the energetic costs
[21]. Below $T_c$ all mutations are bound, whereas for $T > T_c$ as many
mutations open as are necessary to absorb all excess bases. On increasing
the mutation density, $T_c$ approaches the melting temperature and the
intermediate phase vanishes.
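
The growth of the loop-formation barrier with distance from the end can be illustrated with a deliberately crude estimate. The function below is a hypothetical sketch, not the model used in the simulations: it assumes evenly spaced mutations at positions spacing, 2*spacing, ... counted from the overhanging end, charges the loop initiation energy once, and charges the mutated-pair binding energy for every mutation between the end and the loop.

```python
def loop_barrier(position, spacing, eps_l=3.0, eps_mut=2.0):
    """Energy needed to open a bulge loop at the given distance from the
    overhanging end, in a sequence with one mutation every `spacing` units."""
    broken_mutations = position // spacing   # mutations between end and loop
    return eps_l + eps_mut * broken_mutations

# Without mutations in between, only the initiation energy is needed;
# further from the end, each intervening mutation adds eps_mut.
for position in (50, 150, 250):
    print(position, loop_barrier(position, spacing=100))
```

Because the barrier grows linearly with distance while the entropy gain from placing a loop is spread along the whole molecule, the competition between the two is extensive, consistent with the abrupt shortening of the unbound end described above.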

Discussion. — We have identified a BEC-like confor- 
mational transition in periodic and nearly periodic DNA, 
which occurs below the melting transition. This transi- 
tion leads to a change in the contour length of the DNA 
molecule, which is roughly proportional to M — N. We 
also expect an effect on the persistence length of the helical
region due to the increased density of bulge loops.
The hallmark of the transition, i.e. the shortening of 
the unbound end, could be directly observed by resonant 
energy transfer between fluorescent dyes located at the 
ends of the two strands. We expect the existence of the 
intermediate phase to be independent of the details of 
our model. Furthermore, we have shown that the ad- 
ditional conformations possible for repetitive sequences 
change the critical behavior at the melting transition. 